Leveraging Large Language Models and Weak Supervision for Social Media data annotation: an evaluation using COVID-19 self-reported vaccination tweets
The COVID-19 pandemic has presented significant challenges to the healthcare
industry and society as a whole. With the rapid development of COVID-19
vaccines, social media platforms have become a popular medium for discussions
on vaccine-related topics. Identifying and analyzing vaccine-related tweets
can provide valuable insights for public health researchers and policymakers.
However, manual annotation of a large number of tweets is time-consuming and
expensive. In this study, we evaluate the use of Large Language Models, in
this case GPT-4 (March 23 version), and weak supervision to identify COVID-19
vaccine-related tweets, with the purpose of comparing performance against
human annotators. We leveraged a manually curated gold-standard dataset and
used GPT-4 to provide labels without any additional fine-tuning or
instructing, in a single-shot mode (no additional prompting).
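The evaluation step described above reduces, at its core, to agreement statistics between the model's labels and the gold standard. A minimal sketch of that comparison, with invented tweets and labels (the `agreement_metrics` helper and all numbers are our own illustration, not the paper's data):

```python
# Toy sketch of evaluating model-assigned labels against a manually curated
# gold standard. Labels: 1 = vaccine-related tweet, 0 = not. The six
# hypothetical annotations below are placeholders for a real GPT-4 run.

def agreement_metrics(gold, predicted):
    """Accuracy and Cohen's kappa for two binary label sequences."""
    assert len(gold) == len(predicted) and gold
    n = len(gold)
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / n
    # Chance agreement from the marginal label frequencies.
    p_gold = sum(gold) / n
    p_pred = sum(predicted) / n
    p_chance = p_gold * p_pred + (1 - p_gold) * (1 - p_pred)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 1.0
    return accuracy, kappa

gold_labels = [1, 0, 1, 1, 0, 0]   # human annotators
model_labels = [1, 0, 1, 0, 0, 0]  # hypothetical model output
acc, kappa = agreement_metrics(gold_labels, model_labels)
```

Kappa corrects raw accuracy for the agreement two annotators would reach by chance given their label frequencies, which matters when one class dominates.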
Solar Event Tracking with Deep Regression Networks: A Proof of Concept Evaluation
With the advent of deep learning for computer vision tasks, the need for
accurately labeled data in large volumes is vital for any application. The
increasing availability of large amounts of solar image data generated by the
Solar Dynamics Observatory (SDO) mission makes this domain particularly
interesting for the development and testing of deep learning systems. The
currently available labeled solar data are generated by the SDO mission's
Feature Finding Team (FFT) specialized detection modules. The major drawback
of these modules is that detection and labeling are performed at a cadence of
every 4 to 12 hours, depending on the module. Since SDO image data products
are created every 10 seconds, there is a considerable gap between labeled
observations and the continuous data stream. To address this shortcoming, we
trained a deep regression network to track the movement of two solar
phenomena: Active Region and Coronal Hole events. To the best of our
knowledge, this is the first attempt at solar event tracking using a deep
learning approach. Since it is impossible to fully evaluate the performance
of the suggested event tracks with the original data (only partial ground
truth is available), we demonstrate the effectiveness of our approach with
several metrics. With the purpose of generating continuously labeled solar
image data, we present this feasibility analysis showing the great promise of
deep regression networks for this task.
Comment: 8 pages, 5 figures; submitted and accepted for publication at IEEE
Big Data 2019 - SABID Workshop
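The cadence gap the abstract describes can be made concrete with a much-simplified stand-in for the regression tracker: fit a model to the sparse FFT detections of an event's centroid and predict positions at the 10-second image cadence in between. The real system is a deep network operating on image data; the linear fit and all detection times and positions below are our own invented illustration of the interpolation task.

```python
# Simplified sketch: predict an event centroid between sparse detections.
# Detections arrive every 4 hours (14400 s); SDO images every 10 s.

def fit_line(ts, xs):
    """Ordinary least squares for x(t) = a + b * t."""
    n = len(ts)
    mt = sum(ts) / n
    mx = sum(xs) / n
    b = (sum((t - mt) * (x - mx) for t, x in zip(ts, xs))
         / sum((t - mt) ** 2 for t in ts))
    a = mx - b * mt
    return a, b

# Hypothetical centroid x-positions (arcsec) drifting with solar rotation.
det_times = [0.0, 14400.0, 28800.0]  # 4-hour detection cadence, in seconds
det_xs = [100.0, 110.0, 120.0]
a, b = fit_line(det_times, det_xs)

# Predicted position halfway through the first gap (t = 7200 s).
x_mid = a + b * 7200.0
```

A deep regression network replaces the hand-picked linear model with one learned from the images themselves, but the output contract is the same: a position estimate for every unlabeled timestep.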
Estimation of vertical slip rate in an active fault-propagation fold from the analysis of a progressive unconformity at the NE segment of the Carrascoy Fault (SE Iberia)
Many studies have dealt with the calculation of fault-propagation fold growth rates considering a variety of kinematic models, from limb rotation to hinge migration. In most cases, the different geometrical and numerical growth models are based on horizontal pre-growth strata architecture and a constant, known slip rate. Here, we present an estimation of the vertical slip rate of the NE segment of the Carrascoy Fault (SE Iberian Peninsula) from the geometrical modeling of a progressive unconformity developed on alluvial fan sediments with a high depositional slope. The NE segment of the Carrascoy Fault is a left-lateral strike-slip fault with a reverse component belonging to the Eastern Betic Shear Zone, a major structure that accommodates most of the convergence between the Iberian and Nubian tectonic plates in southern Spain. The proximity of this major fault to the city of Murcia underscores the importance of carrying out paleoseismological studies to determine the Quaternary slip rate of the fault, a key geological parameter for seismic hazard calculations. This segment is formed by a narrow fault zone that abruptly joins the northern edge of the Carrascoy Range to the Guadalentin Depression through short, steep alluvial fans of Upper-Middle Pleistocene age. An outcrop in a quarry at the foot of this front reveals a progressive unconformity developed on these alluvial fan deposits, showing the important reverse component of the fault. The architecture of this unconformity is marked by well-developed calcretes on top of some of the alluvial deposits. We determined the age of several of these calcretes by the Uranium-series disequilibrium dating method. The results obtained are consistent with recently published studies on the SW segment of the Carrascoy Fault, which, together with offset channels observed at a few locations, suggest a net slip rate close to 1 m/ka
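The core arithmetic behind a dated-offset slip rate is simple: a vertical separation accrued between two U-series calcrete ages yields a vertical slip rate. A back-of-the-envelope sketch, with entirely illustrative numbers that are not the paper's measurements:

```python
# Illustrative slip-rate arithmetic: vertical offset bracketed by two dated
# surfaces. Converting vertical to net slip would further require the fault
# dip and rake, omitted here. All values are hypothetical.

def vertical_slip_rate(offset_m, age_old_ka, age_young_ka):
    """Vertical slip rate (m/ka) from offset accrued between two dated surfaces."""
    if age_old_ka <= age_young_ka:
        raise ValueError("older surface must predate younger surface")
    return offset_m / (age_old_ka - age_young_ka)

# e.g. 18 m of vertical separation between calcretes dated 70 ka and 40 ka
rate = vertical_slip_rate(offset_m=18.0, age_old_ka=70.0, age_young_ka=40.0)
```

Here 18 m over a 30 ka interval gives 0.6 m/ka of vertical slip; the paper's geometrical modeling of the progressive unconformity refines this idea by accounting for the non-horizontal pre-growth strata.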
Performance analysis of algorithms based on mean-field theory for knapsack-type problems.
We propose a methodology based on mean-field theory for solving large-scale knapsack-type problems with linear and quadratic objective functions. Problems with one to thirty linear constraints are considered. These problems are known in the literature as the knapsack problem, the quadratic knapsack problem, and the multidimensional knapsack problem. They were selected for their simple interpretation and their many real-world applications. For the first two problems, we take instances for which running the exact algorithm is known to be impractical; for the third, we simply take the instances most commonly used to validate algorithm efficiency, for some of which the optimal value is unknown. The essence of the proposed methodology is to find a probability distribution associated with an optimization problem. One of the most widely used is the Boltzmann distribution, which incorporates the objective function and its constraints through Lagrangian relaxation, transforming a discrete problem into a continuous one. However, this distribution by itself is complex and difficult to handle, so a mean-field approximation is made, which consists of choosing, from a set of simple distributions, the one with the smallest difference from the Boltzmann distribution. The optimization problems used to validate the efficiency of the proposed methodology are binary, so the general mean-field distribution presented is suitable for this type. Should this methodology be applied to other kinds of problems, another mean-field distribution fitting them would need to be presented.
The mean-field approach used in this work yields independent equations that estimate the probability of occurrence of each variable through the dual space; that is, by assigning values to the Lagrange multipliers, it is possible to build a probability vector in which each element represents the probability of activating a given variable in a solution of the binary problem. The proposed algorithm is deterministic and capable of finding high-quality solutions on the benchmark problems, with execution times whose orders of magnitude are below those of recently studied algorithms. Objectives and method of study: to distinguish and identify the advantages of using a mean-field probabilistic model for constructing feasible solutions to knapsack-type problems. To this end, we start from the fact that any optimization problem is related to the Boltzmann probability distribution, which is approximated by a much simpler distribution. Given the approximate distribution, a binary solution can be constructed through rounding techniques. Contributions and conclusions: we obtain a fast and effective methodology for constructing feasible solutions to large-scale knapsack-type problems. We address problems with linear constraints, quadratic and linear objective functions, and even multiple constraints. In all these cases, quality solutions are found quickly; on average, as problem size grows, the gap between the best known value and the solution of the proposed methodology tends to decrease. This is because mean-field theory, as its name indicates, works with an averaging scheme, so as the number of variables grows the solutions tend to become more accurate
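The pipeline the abstract describes (Boltzmann distribution, Lagrangian relaxation, independent mean-field probabilities, rounding) can be sketched for the plain 0/1 knapsack problem. This is our own minimal illustration, not the thesis's algorithm: the instance data, the fixed inverse temperature, and the bisection schedule for the multiplier are all assumptions.

```python
# Mean-field sketch for 0/1 knapsack: each variable gets an independent
# Bernoulli probability driven by its Lagrangian score v_i - lam * w_i;
# the multiplier lam is tuned so the EXPECTED weight meets the capacity,
# and the probabilities are then rounded to a feasible binary solution.
import math

def mean_field_knapsack(values, weights, capacity, beta=5.0):
    def probs(lam):
        # Independent (mean-field) activation probabilities: a logistic
        # function of the per-item Lagrangian score.
        return [1.0 / (1.0 + math.exp(-beta * (v - lam * w)))
                for v, w in zip(values, weights)]

    # Bisection on the multiplier: larger lam suppresses heavy items,
    # lowering the expected weight toward the capacity.
    lo, hi = 0.0, max(v / w for v, w in zip(values, weights))
    for _ in range(60):
        lam = (lo + hi) / 2
        exp_weight = sum(p * w for p, w in zip(probs(lam), weights))
        lo, hi = (lam, hi) if exp_weight > capacity else (lo, lam)

    # Round by descending probability while keeping the solution feasible.
    p = probs((lo + hi) / 2)
    order = sorted(range(len(values)), key=lambda i: -p[i])
    x, load = [0] * len(values), 0
    for i in order:
        if load + weights[i] <= capacity:
            x[i], load = 1, load + weights[i]
    return x

solution = mean_field_knapsack(values=[60, 100, 120],
                               weights=[10, 20, 30], capacity=50)
```

The sketch guarantees feasibility but not optimality; the thesis's contribution lies in making this construction fast and accurate on large instances, including quadratic objectives and multiple constraints.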
Pulse of the Pandemic: Iterative Topic Filtering for Clinical Information Extraction from Social Media
The rapid evolution of the COVID-19 pandemic has underscored the need to
quickly disseminate the latest clinical knowledge during a public-health
emergency. One surprisingly effective platform for healthcare professionals
(HCPs) to share knowledge and experiences from the front lines has been social
media (for example, the "#medtwitter" community on Twitter). However,
identifying clinically-relevant content in social media without manual labeling
is a challenge because of the sheer volume of irrelevant data. We present an
unsupervised, iterative approach to mine clinically relevant information from
social media data, which begins by heuristically filtering for HCP-authored
texts and incorporates topic modeling and concept extraction with MetaMap. This
approach identifies granular topics and tweets with high clinical relevance
from a set of about 52 million COVID-19-related tweets from January to mid-June
2020. We also show that because the technique does not require manual labeling,
it can be used to identify emerging topics on a week-to-week basis. Our method
can aid in future public-health emergencies by facilitating knowledge transfer
among healthcare workers in a rapidly-changing information environment, and by
providing an efficient and unsupervised way of highlighting potential areas for
clinical research.
Comment: 24 pages, 5 figures. To be published in the Journal of Biomedical
Informatics
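The iterative loop the abstract outlines (heuristic seed filter, extract characteristic terms of the retained set, re-filter with the expanded vocabulary) can be shown in miniature. In this toy version, MetaMap concept extraction and the topic model are replaced by plain term counting, and the example tweets and seed terms are entirely invented:

```python
# Toy iterative topic filter: seed -> filter -> expand vocabulary -> re-filter.
from collections import Counter

SEED_TERMS = {"intubation", "ards", "ventilator"}  # hypothetical HCP cues
STOPWORDS = {"the", "a", "of", "in", "and", "for", "we", "is"}

def iterate_filter(tweets, seeds, rounds=2, top_k=3):
    relevant = {t for t in tweets if seeds & set(t.lower().split())}
    terms = set(seeds)
    for _ in range(rounds):
        # Most frequent non-stopword terms in the currently relevant set
        # stand in for the paper's topic-model / MetaMap concepts.
        counts = Counter(w for t in relevant for w in t.lower().split()
                         if w not in STOPWORDS)
        terms |= {w for w, _ in counts.most_common(top_k)}
        relevant = {t for t in tweets if terms & set(t.lower().split())}
    return relevant, terms

tweets = [
    "proning before intubation helped our ards patients",
    "proning protocols vary a lot between units",
    "great weather for a walk today",
]
kept, vocab = iterate_filter(tweets, SEED_TERMS)
```

Note how the second tweet contains no seed term but is pulled in once "proning" enters the vocabulary from the first round, while the off-topic tweet stays excluded; this bootstrapping is what lets the approach track emerging topics without manual labels.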